bloom filter bulk inserts and queries #723
base: main
Conversation
Some initial comments. I have yet to look at the bulk insert code
LGTM! If we fix the test failure that I noted in my previous review, then we're good to go
CI failure due to flaky test fixed in #724.
Force-pushed from 09afe55 to 49f8071
LGTM with the new bulk query changes
loop2_prefetch :: KeyIx -> ST s ()
loop2_prefetch !kix
  | kix == ksN  = pure ()
  | otherwise   = do
      -- prefetch this key's cache line in the filter, then move on;
      -- the real lookups happen in a second loop over the keys
      let !keyhash = P.indexPrimArray keyhashes kix
      Bloom.prefetchElem filter keyhash
      loop2_prefetch (kix+1)
The commit message for 8eaf7ce says that the prefetch distance is the number of runs, but this looks like the prefetch distance is the number of keys
Yes, oops. I kept changing my mind. I did both versions and benchmarked them.
Updated the commit message.
So in the benchmarks I did, nesting the loops this way round was slightly faster. But arguably, the other way round should have more stable prefetching behaviour, since it would depend on the number of runs and not the lookup batch size.
Switching it round is not that hard. I could still do that. Thoughts?
I've switched it round.
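For concreteness, here is a minimal sketch of the switched nesting, with hypothetical names (`queryKeysOuter`, `prefetch`, `lookupOne` are illustrations, not the PR's actual code): keys on the outside, filters on the inside, so each key is prefetched in every filter before being looked up in every filter.

```haskell
-- Keys-outer nesting: the prefetch distance is the number of runs,
-- independent of the lookup batch size.
queryKeysOuter :: Monad m
               => (f -> h -> m ())    -- per-filter prefetch (assumed)
               -> (f -> h -> m Bool)  -- per-filter lookup (assumed)
               -> [f] -> [h] -> m [[Bool]]
queryKeysOuter prefetch lookupOne filters keyhashes =
    mapM queryKey keyhashes
  where
    queryKey h = do
      mapM_ (\f -> prefetch  f h) filters   -- prefetch pass: distance = #runs
      mapM  (\f -> lookupOne f h) filters   -- lookup pass
```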
Instead of inserting keys into the bloom filter one by one as they are added to the run accumulator, save them up and add them all in one go when the page is being finalised. This lets us use a bloom filter bulk insert, which in turn lets us use memory prefetching. The result should be faster.
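A minimal sketch of that shape, with hypothetical names throughout (`PageAcc`, `addKey`, `insertMany` are illustrations, not the library's API): buffer the key hashes as keys arrive, and hand them to a bulk insert only at page finalisation, where the bulk insert can prefetch cache lines for upcoming keys while setting the bits for the current one.

```haskell
import Control.Monad.ST (ST)
import Data.STRef (STRef, modifySTRef', readSTRef, writeSTRef)
import Data.Word (Word64)

data PageAcc s bloom = PageAcc
  { accKeys   :: STRef s [Word64]  -- key hashes saved up for this page
  , accFilter :: bloom             -- the run's mutable bloom filter
  }

addKey :: PageAcc s bloom -> Word64 -> ST s ()
addKey acc h = modifySTRef' (accKeys acc) (h :)

-- 'insertMany' stands in for a real bulk-insert operation
finalisePage :: (bloom -> [Word64] -> ST s ()) -> PageAcc s bloom -> ST s ()
finalisePage insertMany acc = do
    hs <- readSTRef (accKeys acc)
    insertMany (accFilter acc) hs
    writeSTRef (accKeys acc) []
```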
Fetch into the caches in the least intrusive way, "0" levels, rather than "3" levels. This does not appear to slow down inserts, and should evict fewer things from the caches. And document what level "0" means and why we use it.
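For reference, the levels here are GHC's prefetch primops (this sketch shows the general mechanism, not necessarily the PR's exact code). The numeric suffix is a temporal-locality hint: 3 asks for the highest locality, 0 for the lowest, so a level-0 prefetch is the least intrusive and should evict less existing cache content.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (ByteArray#, Int#, prefetchByteArray0#, prefetchByteArray3#)
import GHC.ST (ST (ST))

-- wrap the raw primops as ST actions; the Int# is a byte offset
prefetchLow, prefetchHigh :: ByteArray# -> Int# -> ST s ()
prefetchLow  ba i = ST (\s -> (# prefetchByteArray0# ba i s, () #))
prefetchHigh ba i = ST (\s -> (# prefetchByteArray3# ba i s, () #))
```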
`bloomQueriesModel` was not really a proper model, because the model itself was using actual bloom filters. The model is now instead a `Set`, and `prop_bloomQueriesModel` is updated because the model will now only return true positives and negatives.
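A sketch of what the Set-based model looks like (hypothetical signature; the real code uses the library's key and index types). Because a Set answers membership exactly, the model yields only true positives and true negatives, so the property can only require that every model hit is also a real bloom filter hit; the real filter may add false positives.

```haskell
import qualified Data.Set as Set

-- report (filter index, key index) for every exact-membership hit
bloomQueriesModel :: Ord k => [Set.Set k] -> [k] -> [(Int, Int)]
bloomQueriesModel filters keys =
    [ (fix, kix)
    | (fix, filterModel) <- zip [0 ..] filters
    , (kix, key)         <- zip [0 ..] keys
    , key `Set.member` filterModel
    ]
```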
Force-pushed from 49f8071 to 6dac4ec
Previously, for the Classic Bloom filter implementation we had two different implementations of bloomQueries: one that was relatively simple and didn't rely on anything fancy, and one that went all out to maximise performance. The high performance one had to be disabled when we added the block-structured bloom filter implementation, since it was tightly coupled to the classic implementation.

With the new bloom filter API and implementation, we can now implement a single high performance version of bulk query. We no longer need separate higher and lower performance versions, since we no longer need to rely on fancy features like unlifted boxed arrays. So strip out the bloom-query-fast cabal flag and the BloomFilterQuery2 module.

The updated BloomFilterQuery1 does a simple nested loop: iterating over all filters, and within that over all keys. It does two loops over the keys: one to prefetch the key for the filter, and a second to do the lookups for real. Thus the prefetch distance is the number of keys, which is of course somewhat variable. In benchmarks this version was slightly faster than nesting the loops the other way around, but perhaps the other way around would have more stable prefetching behaviour over a wider range of key batch sizes. Changing this would not be too hard.
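A sketch of the two-pass shape described above, for one filter (hypothetical names; `prefetch` and `lookupOne` are assumptions standing in for the real API): first sweep the key hashes issuing prefetches, then sweep them again doing the real lookups, by which time the needed cache lines should already be in flight.

```haskell
import Control.Monad.ST (ST)
import Data.Word (Word64)
import qualified Data.Primitive.PrimArray as P

queryOneFilter :: (f -> Word64 -> ST s ())    -- per-element prefetch (assumed)
               -> (f -> Word64 -> ST s Bool)  -- per-element lookup (assumed)
               -> f -> P.PrimArray Word64 -> ST s [Bool]
queryOneFilter prefetch lookupOne f keyhashes = do
    let n = P.sizeofPrimArray keyhashes
    -- pass 1: prefetch every key's cache line in this filter
    sequence_ [ prefetch f (P.indexPrimArray keyhashes i) | i <- [0 .. n-1] ]
    -- pass 2: the real lookups
    mapM (lookupOne f . P.indexPrimArray keyhashes) [0 .. n-1]
```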
The query module used to be big, and there used to be two of them. Now there's only one and it's a lot smaller. So it makes sense to keep it all together in one module.
We were using a mix of `import qualified ... as BF` and `import qualified ... as Bloom`.
Force-pushed from 6dac4ec to ef64262
This means our prefetching is done with a prefetch distance of the number of runs, which is more stable than the key batch size. Also tweak the code so we get only joinrec and not letrec in the core.
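On the joinrec point, an illustrative sketch (not the PR's code): a local loop whose recursive calls are all saturated tail calls compiles to a Core joinrec, i.e. a label plus jumps with no closure allocation, whereas a loop that is returned or stored has to stay a letrec with an allocated closure.

```haskell
{-# LANGUAGE BangPatterns #-}
sumTo :: Int -> Int
sumTo n0 = go n0 0
  where
    go :: Int -> Int -> Int
    go 0 !acc = acc
    go n !acc = go (n - 1) (acc + n)  -- tail call only: becomes a join point
```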
Force-pushed from ef64262 to a02e702
Two changes, both based on bulk operations for bloom filters.
For Bloom filter inserts in run accumulation: instead of inserting keys into the bloom filter one by one as they are added to the run accumulator, save them up and add them all in one go when the page is being finalised. This then lets us use a bloom filter bulk insert, which lets us use memory prefetching.
For Bloom filter queries in key lookups, update the existing bulk query code to properly take advantage of the new API and use prefetching. We can now also simplify and use a single high performance implementation, rather than needing two (a more compatible one and a faster one that relied on fancier features available in later GHC versions).
Results for the WP8 benchmark (100M elements, 10 bits per key, full caching, 10k batches of 256 keys):
So overall about a 16% improvement in ops/sec on the primary WP8 benchmark, and as a bonus, getting over the magic 100k ops/sec threshold (on my laptop).